Parameterized Pattern Matching - Succinctly
Abstract
The fields of succinct data structures and compressed text indexing have seen quite a bit of progress over the last 15 years. An important achievement, primarily using techniques based on the Burrows-Wheeler Transform (BWT), was obtaining the full functionality of the suffix tree in the optimal number of bits. A crucial property that allows the use of the BWT for designing compressed indexes is order-preserving suffix links. Specifically, the relative ordering between two suffixes in the subtree of an internal node is the same as that of the suffixes obtained by truncating the first character of the two suffixes. Unfortunately, in many variants of the text-indexing problem, e.g., parameterized pattern matching and 2D pattern matching, this property does not hold. Consequently, the compressed indexes based on the BWT do not directly apply. Furthermore, a compressed index for any of these variants has been elusive throughout the advancement of the field of succinct data structures. We achieve a positive breakthrough on one such problem, namely the Parameterized Pattern Matching problem. In parameterized matching, a pattern matches some location in the text iff there is a one-to-one correspondence between the alphabet symbols of the pattern and those of the text. More specifically, assume that the text T contains n characters from a static alphabet Σs and a parameterized alphabet Σp, where Σs ∩ Σp = ∅ and |Σs ∪ Σp| = σ. A pattern P matches a substring S of T iff the static characters match exactly, and there exists a one-to-one function that renames the parameterized characters in S to those in P. The previous indexing solution [Baker, STOC 1993], known as the Parameterized Suffix Tree, requires Θ(n log n) bits of space, and can find all occ occurrences of P in O(|P| log σ + occ) time. In this paper, we present the first succinct index that occupies n log σ + O(n) bits and answers queries in O((|P| + occ · log n) log σ log log σ) time.
We also present a compact index that occupies O(n log σ) bits and answers queries in O(|P| log σ + occ · log n) time.
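The matching rule defined in the abstract can be illustrated with a brute-force check. The sketch below is not the succinct index of the paper, nor Baker's parameterized suffix tree; it only tests the definition directly, in time proportional to n · |P|. The names `p_match` and `p_search`, and the choice to pass the static alphabet as a Python set, are assumptions made for illustration.

```python
from typing import Dict, List, Set


def p_match(pattern: str, window: str, static: Set[str]) -> bool:
    """Check whether `window` parameterized-matches `pattern`.

    Static characters must be equal position by position; the remaining
    (parameterized) characters must be related by a one-to-one renaming.
    """
    if len(pattern) != len(window):
        return False
    to_pattern: Dict[str, str] = {}  # parameterized char of window -> char of pattern
    to_window: Dict[str, str] = {}   # inverse map, keeps the renaming one-to-one
    for w, p in zip(window, pattern):
        if w in static or p in static:
            if w != p:
                return False
            continue
        # setdefault records the pair on first sight and returns the
        # previously recorded partner otherwise.
        if to_pattern.setdefault(w, p) != p or to_window.setdefault(p, w) != w:
            return False
    return True


def p_search(text: str, pattern: str, static: Set[str]) -> List[int]:
    """Return every start position in `text` where `pattern` p-matches (naive scan)."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(pattern, text[i:i + m], static)]
```

For example, with static alphabet {"A"}, the pattern "uAv" p-matches the text "xAyAxAx" at positions 0 and 2 ("xAy" and "yAx"), but not at position 4, because "xAx" would force both u and v to be renamed to x.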
Similar resources
Parameterized matching on non-linear structures
The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set Σ. In the parameterized pattern matching model, a consistent renaming of symbols from Σ is allowed in a match. The parameterized matching paradigm has proven useful in problems in software engineering, computer vision, and other applications. In clas...
Position Heaps for Parameterized Strings
We propose a new indexing structure for parameterized strings, called the parameterized position heap. The parameterized position heap is applicable to the parameterized pattern matching problem, where the pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We propose an online construction algorithm of parameterized ...
A W[1]-Completeness Result for Generalized Permutation Pattern Matching
The NP-complete Permutation Pattern Matching problem asks whether a permutation P (the pattern) can be matched into a permutation T (the text). A matching is an order-preserving embedding of P into T. In the Generalized Permutation Pattern Matching problem one can additionally enforce that certain adjacent elements in the pattern must be mapped to adjacent elements in the text. This paper stud...
Parameterized Matching
Two equal-length strings s and s′, over alphabets Σs and Σs′, parameterize match if there exists a bijection π : Σs → Σs′ such that π(s) = s′, where π(s) is the renaming of each character of s via π. Parameterized matching is the problem of finding all parameterized matches of a pattern string p in a text t. It was introduced as a model for software duplication detection in software maintenanc...
Faster fully compressed pattern matching algorithm for a subclass of straight-line programs
We show an efficient pattern-matching algorithm for strings that are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is the concatenation. In this paper, both text T and pattern P are given by straight-line programs T and P. The length of the text T (pattern P, resp.) may grow exponentially with respect to its description size ...
Journal: CoRR
Volume: abs/1603.07457
Pages: -
Publication date: 2016